Abstract. In recent years, a large number of RDF data sets have become available
on the Web. However, due to the semi-structured nature of RDF data, missing
values affect the answer completeness of queries posed against these data. To
overcome this limitation, we propose RDF-Hunter, a novel hybrid query processing
approach that brings together machine and human computation to execute
queries against RDF data. We develop a novel quality model and query engine that
enable RDF-Hunter to decide on the fly which parts of a query should
be executed through conventional technology and which through crowd computing. To evaluate
RDF-Hunter, we created a collection of 50 SPARQL queries against the DBpedia
data set, executed them using our hybrid query engine, and analyzed the accuracy
of the outcomes obtained from the crowd. The experiments show
that the overall approach is feasible and produces query results that reliably and
significantly enhance the completeness of automatic query processing responses.


1 Introduction 

Linked Open Data (LOD) initiatives have fostered the publication of Linked Data (LD)
on almost any subject. The majority of these data artifacts have been created by
integrating multiple, typically heterogeneous sources, and contain a fair share of missing
values. Yet, due to the semi-structured nature of RDF data, such incompleteness cannot
be easily detected, with negative effects on query processing. For example, running a
query against the DBpedia data set (footnote 1) that asks for movies, including their
producers, that have been filmed in New York City by Universal Pictures returns no
producers for 14% of the movies in the result set. With cases like this being a common
occurrence in LD applications, further techniques are needed to improve this data aspect
and subsequent query processing results. Recent research suggests that microtask
crowdsourcing provides a platform for implementing effective hybrid approaches to Linked
Data quality assessment. The relational database community has embraced similar ideas
to design advanced query processing systems that combine human and computational
intelligence [4, 6, 8, 10]. However, most of the existing proposals focus on manually
specifying those parts of the query that should resort to human involvement, typically
devising bespoke query languages and extensions that are applied manually on
top of established database technology. Such an approach is less feasible for an LD
scenario, which is confronted with large, decentralized volumes of semi-structured data.
In this work we present RDF-Hunter, the first system that, by exploiting the
characteristics of RDF data, automatically identifies the exact portions of a query against
an RDF data set that should be processed by the crowd in order to augment answer
completeness. Going back to our example, RDF-Hunter will assess that the sub-query asking
for movie producers needs to be outsourced to the crowd in order to collect the missing
information in the DBpedia data set; it will then autonomously determine how to set up
the corresponding crowdsourcing task and execute it against a microtask platform.

1 SPARQL endpoint: http://dbpedia.org/sparql

In a nutshell, RDF-Hunter is a hybrid query processing system that combines human
and computer capabilities to run queries against RDF data sets. Its aim is to enhance
the answer completeness of SPARQL queries by finding missing values in the data set
via microtask crowdsourcing. Our solution provides highly flexible,
crowdsourcing-enabled SPARQL query execution: no extensions to SPARQL or RDF are required,
and the user can configure the level of expected answer completeness in each query
execution. We define a quality model for completeness of RDF data sets. RDF-Hunter
implements query decomposition techniques that decide on the fly which parts of a
SPARQL query are potentially affected by missing data values and should resort to
the crowd. The query engine combines crowd answers with intermediate, automatically
computed results. At execution time, RDF-Hunter collects information about the types of
queries the crowd is likely to be able to solve accurately.

To evaluate RDF-Hunter, we crafted a collection of 50 SPARQL queries against
DBpedia (version 2014), executed them with our system, and analyzed the quality of
the crowd answers. The goal of the experiments was to assess the answer completeness
produced by RDF-Hunter when queries were executed against a SPARQL endpoint and
the CrowdFlower (footnote 2) crowdsourcing platform. The empirical results show that the
overall approach is not only feasible but can reliably augment response completeness.

The contributions of our work can be summarized as follows: 

- the design of a quality model for estimating RDF completeness; 

- a proposal for interpreting crowd knowledge; 

- a novel query planner and a query execution engine able to decide which parts of a 
SPARQL query will be executed against an RDF data set and the crowd; 

- an extensive benchmark composed of 50 SPARQL queries to study and evaluate 
the answer completeness of query processing systems; and 

- an extensive empirical evaluation using the DBpedia public SPARQL endpoint and 
the CrowdFlower microtask platform. 


2 http://www.crowdflower.com/ 





2 Related Work 


The database community has proposed several human/computer query processing
architectures for relational data. Approaches such as CrowdDB, Deco, and
Qurk target scenarios in which existing microtask platforms are directly embedded
in query processing systems. CrowdDB provides SQL-like data definition and
query languages to support hybrid query execution, and attempts to reduce the number
of tasks to be outsourced by exploiting structural properties of the relational data.
Deco [6] implements caching strategies to reuse previously crowdsourced data.
Additionally, Deco and Qurk provide a set of physical operators and models to
estimate selectivities and cardinalities. These statistics, in conjunction with the
physical operators, make it possible to define physical plans that reduce execution time,
monetary cost, and number of tasks. By contrast, we propose a quality model that exploits
not only the structure of RDF data sets, but also the disagreement and uncertainty of the
crowdsourced data. Additionally, we devise a query planner that relies on the quality
model and applies query decomposition techniques to automatically generate
data-informed processing pipelines that use crowd intelligence effectively. In this way,
the query planner makes sure that human contribution is sought only in those cases in
which it will most likely lead to result improvements. This speeds up the overall query
execution, and reduces both the costs and the average time needed to obtain the
crowdsourced answers. Further, RDF-Hunter frees users from the task of manually
selecting the parts of the queries that will be evaluated by the crowd and the ones that
will be posed against the RDF engines.

Crowdsourcing has also been shown to be feasible for other scenarios related
to Semantic Web technologies. Amsterdamer et al. propose OASSIS, a query-driven
crowdsourcing platform that responds to user information needs. OASSIS
combines general knowledge from ontologies with frequent patterns mined from personal
data collected from the crowd. OASSIS provides a SPARQL-like query language in which
users specify the sub-queries that will be evaluated against the ontology and the ones that
will be mined from the crowd. Additionally, the OASSIS query engine is able to order
the execution of the sub-queries in a way that minimizes the questions posed to the crowd.
LODRefine (footnote 3), an LD integration tool, has made available an extension that allows
users to manually configure and run specific data matchmaking tasks on CrowdFlower.
Although these approaches address different LD management problems, they require
user intervention in the definition of the crowd-based workflows that will be evaluated
to solve the corresponding LD management problem. In contrast, RDF-Hunter
automatically creates hybrid query-driven workflows, and combines the results obtained from
both RDF data sets and the crowd to enhance completeness during query processing.

3 Motivating Example 

Consider the SPARQL query in Listing 1.1 to be issued against the DBpedia endpoint.
This query retrieves information about capitals in Europe and their respective countries.
When executing the query, the total number of answers is 47. This means that DBpedia
contains 47 entities that are classified as European capitals (triple pattern 4) and that
are linked to their corresponding country (triple pattern 5). However, executing only
triple pattern 4 reveals that DBpedia contains 56 bindings for European capitals.
This suggests that the completeness of this portion of the data set is 47/56 = 0.84.

3 http://code.zemanta.com/sparkica/


Listing 1.1: SPARQL query against DBpedia to select cities and countries such that the
cities are capitals in Europe

1 PREFIX dbpedia-yago: <http://dbpedia.org/class/yago/>
2 PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
3 SELECT DISTINCT ?city ?country WHERE {
4   ?city a dbpedia-yago:CapitalsInEurope .
5   ?city dbpedia-owl:country ?country . }


We crowdsourced the missing values of the previous SPARQL query via microtasks
submitted to CrowdFlower. Table 1 reports on: (i) the results obtained from the crowd
(the value for the variable ?country); (ii) the crowd's confidence, denoted as
$\gamma \in [0.0, 1.0]$, provided by the platform; and (iii) $\phi$, the normalized average
of the familiarity of the crowd workers with the topic, which we asked for on a scale
from 1 to 7. For the query from Listing 1.1, the crowd answered that 8 out of the 9 cities
are located in a country. The crowd submitted answers with high confidence values,
0.89 on average. 61% of the participants in these tasks claimed that they were familiar
with the topic.

The query presented above illustrates the aspects of SPARQL query execution that
are likely to benefit most from crowdsourcing: 1) the portion of the RDF data set
contains missing values; and 2) the crowd has the skills to complete the missing portions of
the data set. These two properties make it possible to devise effective solutions for
crowdsourced query execution that scale to large data sets. A naive approach that submits
to the crowd every single triple pattern contained in a query is not feasible, since the
amount of data subject to human assessment (and thus the associated cost) would be
very large. Therefore, we propose an approach that exploits the structure of RDF data
sets and information about the crowd to decide on the fly which parts of the query
require human intervention, so that it can scale up to the LOD data sets.


Table 1: Crowdsourcing results for the query from Listing 1.1. $\gamma \in [0.0, 1.0]$ is
the crowd's confidence, and $\phi \in [0.0, 1.0]$ is the average of the crowd's familiarity
with the topic. The db prefix corresponds to <http://dbpedia.org/resource/>.

Crowdsourced instances of ?city   Crowd's answers for ?country   $\gamma$   $\phi$
-------------------------------   ----------------------------   --------   ------
db:Chișinău                       db:Moldova                     0.833      0.476
db:Edinburgh                      db:Scotland                    1.0        0.761
db:Episkopi_Cantonment            db:Akrotiri_and_Dhekelia       0.833      0.404
db:Gibraltar                      db:United_Kingdom              0.666      0.69
db:Helsinki                       db:Finland                     1.0        0.743
db:Madrid                         db:Spain                       1.0        0.976
db:Pristina                       db:Kosovo                      1.0        0.714
db:Vatican_City                   db:Vatican_City                0.833      0.81
db:Monaco                         (No value)                     0.80       0.743

Fig. 1: The RDF-Hunter architecture


4 Our Approach 

Figure 1 depicts the components of RDF-Hunter, which receives as input a SPARQL
query $Q$ and a quality threshold $\tau$. The RDF Quality Model estimates the completeness
of the portions of the data set that yield results for $Q$. The Query Decomposer generates
sub-queries from $Q$, taking into consideration $\tau$, the quality model, and the human input
stored in the Interpretations of the Crowd Knowledge. The sub-queries are executed
by the SPARQL Engine, which contacts the RDF data set and sends the RDF triple patterns
to be crowdsourced to the Microtask Manager. The microtask manager generates the human
tasks and submits them to the microtask platform. The SPARQL engine combines the
results retrieved from the data set with the human input to produce the results for $Q$.

4.1 RDF Quality Model for Completeness 

We propose a model to estimate the completeness of RDF data sets. Our model
captures the multiplicity of predicates at the level of RDF resources and classes. These
multiplicities are used to compute the completeness of the RDF resources of the data
set with respect to the predicates that characterize these resources. First, we define the
multiplicity of a resource $s$ with respect to a predicate $p$, named $M_D(s|p)$, as the
number of objects associated with the resource $s$ through the predicate $p$.

Definition 1. (Predicate Multiplicity of an RDF Resource). Given an RDF resource $s$
occurring in the data set $D$, the multiplicity of the predicate $p$ for the resource $s$ is:

$$M_D(s|p) := |\{o \mid (s, p, o) \in D\}|$$

Consider the RDF graph depicted in Figure 2, where ovals represent URIs and
rectangles denote literals. Edges correspond to relationships between nodes, annotated with
the corresponding predicate. This RDF graph contains four nodes of type schema.org:Movie.
In this figure, movies are enclosed with the nodes that represent their producers; the
multiplicity of the predicate db-prop:producer is presented for each movie. For example, the
resource $s$ = db:The_Interpreter has three different values for the predicate
$p$ = db-prop:producer; therefore, $M_D(s|p)$ is 3 in this case.
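
Definition 1 can be sketched in a few lines of Python. This is only an illustration over an in-memory list of triples; the function name `predicate_multiplicity`, the `TOY_DATASET` list, and the `ex:producer_*` objects are placeholders introduced here, not part of RDF-Hunter or DBpedia.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def predicate_multiplicity(dataset: Iterable[Triple], s: str, p: str) -> int:
    """Return M_D(s|p): the number of distinct objects o with (s, p, o) in D."""
    return len({o for (subj, pred, o) in dataset if subj == s and pred == p})


# Toy stand-in for part of Figure 2. The ex:producer_* objects are placeholders.
TOY_DATASET = [
    ("db:The_Interpreter", "db-prop:producer", "ex:producer_1"),
    ("db:The_Interpreter", "db-prop:producer", "ex:producer_2"),
    ("db:The_Interpreter", "db-prop:producer", "ex:producer_3"),
    ("db:The_Interpreter", "rdf:type", "schema.org:Movie"),
    ("db:Tower_Heist", "rdf:type", "schema.org:Movie"),
]

print(predicate_multiplicity(TOY_DATASET, "db:The_Interpreter", "db-prop:producer"))
print(predicate_multiplicity(TOY_DATASET, "db:Tower_Heist", "db-prop:producer"))
```

Using a set comprehension means that duplicate triples do not inflate the count, which matches the set-based definition.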

























Next, we define the multiplicity of a class $C$ with respect to a predicate $p$, named
$AM_D(C|p)$, as the aggregated number of objects associated through the predicate $p$
with resources $s$ belonging to the class $C$.

Definition 2. (Aggregated Multiplicity of a Class). For each class $C$ occurring in the
RDF data set $D$, the aggregated multiplicity of $C$ over the predicate $p$ is:

$$AM_D(C|p) := \lceil F(\{M_D(s|p) \mid (s, p, o) \in D \land (s, a, C) \in D\}) \rceil$$

- $(s, a, C)$ corresponds to the triple $(s, \text{rdf:type}, C)$, which means that the subject $s$
belongs to the class $C$.

- $F(\cdot)$ is an aggregation function, e.g., the median.

Suppose the class schema.org:Movie comprises only the four movies in Figure 2 and
the median is the aggregation function. The multiplicity of schema.org:Movie with respect
to the predicate db-prop:producer, i.e., $AM_D$(schema.org:Movie|db-prop:producer), is 3. Note
that the movies db:Tower_Heist and db:Non-Stop_(film) are not related to any producer in
this data set, and are therefore not considered in the computation of the
aggregation function $F$.
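
The computation above can be reproduced with a short sketch. It assumes one multiplicity value per resource as input, the median as $F$, and a return value of 0 when no resource of the class uses the predicate; the name `aggregated_multiplicity` and the dictionary layout are illustrative choices, not the authors' implementation.

```python
import math
from statistics import median
from typing import Callable, Dict, Sequence


def aggregated_multiplicity(
    multiplicities: Dict[str, int],
    aggregate: Callable[[Sequence[int]], float] = median,
) -> int:
    """Return AM_D(C|p) from the per-resource values M_D(s|p) of a class C."""
    # Resources with no value for p have no triple (s, p, o) in D and are skipped.
    values = [m for m in multiplicities.values() if m > 0]
    if not values:
        return 0  # assumption: no resource of C uses p at all
    return math.ceil(aggregate(values))


# Producer multiplicities of the four movies in the running example.
MOVIE_PRODUCERS = {
    "db:The_Interpreter": 3,
    "db:Legal_Eagles": 2,
    "db:Tower_Heist": 0,
    "db:Non-Stop_(film)": 0,
}

print(aggregated_multiplicity(MOVIE_PRODUCERS))
```

With the median of the two non-zero values (3 and 2) equal to 2.5, the ceiling yields the aggregated multiplicity of 3 reported in the text.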

Finally, the completeness of an RDF resource $s$ with respect to a predicate $p$ is
defined as the result of normalizing the multiplicity of $s$ with respect to $p$ by the
maximum aggregated multiplicity among the classes to which $s$ belongs.


Definition 3. (Completeness of an RDF Resource with Respect to a Predicate). Given
an RDF resource $s$ and a predicate $p$ occurring in the data set $D$, let $C_1, \ldots, C_n$
be the classes in $D$ such that $(s, a, C_1) \in D, \ldots, (s, a, C_n) \in D$. The completeness
of $s$ with respect to $p$ in the data set $D$ is defined as follows:

$$Comp_D(s|p) := \begin{cases} \dfrac{M_D(s|p)}{AM_D(C'|p)} & \text{if } AM_D(C'|p) \neq 0 \\ 1 & \text{otherwise} \end{cases}$$

where $AM_D(C'|p) = \max(AM_D(C_1|p), \ldots, AM_D(C_n|p))$.

Suppose the movie db:The_Interpreter also belongs to the class
http://dbpedia.org/ontology/Film, and the aggregated multiplicity of this class with respect
to the predicate db-prop:producer is 5. Then, the completeness of db:The_Interpreter is
3/5 = 0.6, indicating that 40% of the producers of this movie are not represented in this
data set.
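
As a minimal sketch of Definition 3, the function below takes the multiplicity of a resource and the aggregated multiplicities of all classes it belongs to. The function name and argument layout are assumptions made for this illustration.

```python
from typing import Sequence


def completeness(multiplicity: int, class_multiplicities: Sequence[int]) -> float:
    """Return Comp_D(s|p) given M_D(s|p) and AM_D(C_i|p) for every class C_i of s."""
    # AM_D(C'|p) is the largest aggregated multiplicity among the classes of s.
    am_max = max(class_multiplicities, default=0)
    if am_max == 0:
        return 1.0  # nothing is expected for p, so s counts as complete
    return multiplicity / am_max


# db:The_Interpreter: 3 producers; schema.org:Movie has AM = 3, dbpedia-owl:Film has AM = 5.
print(completeness(3, [3, 5]))
```

Taking the maximum over the classes is the conservative choice: the resource is measured against the most demanding class it belongs to.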



Fig. 2: Portion of the DBpedia data set for movies 



















4.2 Interpretations of the Crowd Knowledge 


We represent the interpretation of the answers provided by the crowd in three
knowledge bases modeled as fuzzy sets: $KB^+$, $KB^-$, and $KB^\sim$. $KB^+$ comprises the triples
that should belong to the data set, while those that should not exist according to the
crowd compose $KB^-$; finally, $KB^\sim$ contains the associations that the crowd could not
establish because of a lack of knowledge. For example, a triple in $KB^+$ indicates that
the crowd considers that this triple should be part of the RDF data set, e.g., db:Madrid
is the capital of db:Spain as reported in Table 1. All the triples in these fuzzy sets are
annotated with a membership degree $m$, which states how reliable an answer from the
crowd is. We have empirically observed that in some cases workers declare themselves
unfamiliar with the triple pattern they evaluate, e.g., the country of db:Chișinău, although
the platform reported high confidence in this answer (see Table 1). Therefore, in this work,
we computed $m$ as the average of the workers' confidence and normalized familiarity.

Definition 4. (Interpretations of the Crowd Knowledge). Given an RDF data set $D$
and a pool of human resources CROWD, let $D^*$ be a virtual data set
composed of all the triples that 'should' be in $D$. The interpretation of the knowledge
of CROWD is defined as a 3-tuple:

$$KB = (KB^+, KB^-, KB^\sim)$$

where $KB^+$, $KB^-$, $KB^\sim$ are fuzzy sets over RDF data composed of 4-tuples (quads)
of the form $(s, p, o, m)$ such that:

- $m \in [0, 1]$ is the membership degree of the RDF triple $(s, p, o)$ to the corresponding
fuzzy set,

- $(s, p, o, m) \in KB^+$ iff $o$ is a constant and according to CROWD $(s, p, o)$ belongs
to the virtual data set $D^*$,

- $(s, p, o, m) \in KB^-$ iff $o$ is a variable and according to CROWD $(s, p, o)$ does not
belong to the virtual data set $D^*$, and

- $(s, p, o, m) \in KB^\sim$ iff $o$ is a variable or a constant, and according to CROWD the
membership of $(s, p, o)$ to the virtual data set $D^*$ is unknown.

Given an RDF resource $s$ and a predicate $p$, we are also interested in representing
the completeness of $s$ with respect to $p$ in the knowledge base $KB^+$.

Definition 5. (Completeness of an RDF Resource with Respect to a Predicate in KB).
Given an RDF resource $s$ and a predicate $p$ occurring in the knowledge base $KB$, let
$C_1, \ldots, C_n$ be the classes in the RDF data set $D$ such that
$(s, a, C_1) \in D, \ldots, (s, a, C_n) \in D$. The completeness of $s$ with respect to $p$ in
$KB$ is defined as follows:

$$Comp_{KB}(s|p) := \begin{cases} \dfrac{M_{KB}(s|p)}{AM_D(C'|p)} & \text{if } AM_D(C'|p) \neq 0 \\ 1 & \text{otherwise} \end{cases}$$

- $M_{KB}(s|p) = |\{o \mid (s, p, o) \in KB^+ \lor (s, p, o) \in KB^- \lor (s, p, o) \in KB^\sim\}|$

- $AM_D(C'|p) = \max(AM_D(C_1|p), \ldots, AM_D(C_n|p))$

Consider that the movie db:Tower_Heist also belongs to the class
http://dbpedia.org/ontology/Film, and the aggregated multiplicity of this class with respect
to the predicate db-prop:producer is 5. Suppose the crowd has declared that this movie is
produced by db:Brian_Grazer and this fact is part of $KB^+$. Then, the completeness of
db:Tower_Heist in $KB^+$ is 1/5 = 0.2, indicating that 80% of the producers of this movie
have not been collected from the crowd.

4.2.1 Measuring Disagreement. Consider an RDF resource $s$ and a predicate $p$. The
CROWD disagreement about the (non-)existence of RDF triples with subject $s$ and
predicate $p$ is defined as follows:

$$D(s|p) = 1 - |m^{+} - m^{-}| \qquad (1)$$

- $m^{+} = \mathrm{average}(\{m \mid (s, p, o, m) \in KB^{+}\})$

- $m^{-} = \mathrm{average}(\{m \mid (s, p, o, m) \in KB^{-}\})$

Disagreement values close to 0.0 indicate high consensus about the (non-)existence of
a triple in the virtual data set $D^*$. To illustrate this, suppose that CROWD is asked to
provide the db-prop:producer for the movie db:Tower_Heist from the data set in Figure 2, and
the crowdsourced answers are: i) "Brian Grazer is a producer of Tower Heist", with
a membership degree of 0.90, i.e., $t_1 \in KB^+$ with $t_1$ = (db:Tower_Heist, db-prop:producer,
db:Brian_Grazer, 0.90); and ii) "Tower Heist has no producers", with a membership degree
of 0.05, i.e., $t_2 \in KB^-$ with $t_2$ = (db:Tower_Heist, db-prop:producer, _:o, 0.05)
(footnote 4). Suppose that $t_1$ and $t_2$ are the only triples in $KB^+$ and $KB^-$,
respectively. The disagreement in CROWD about the producer of this movie is
$1 - |0.90 - 0.05| = 0.15$. Low disagreement suggests that CROWD confirms the
(non-)existence of a certain fact.

4.2.2 Measuring Uncertainty. Consider an RDF resource $s$ and a predicate $p$. The
uncertainty of CROWD about the (non-)existence of RDF triples with subject $s$ and
predicate $p$ is defined as follows:

$$U(s|p) = m^{\sim}, \text{ where } m^{\sim} = \mathrm{avg}(\{m \mid (s, p, o, m) \in KB^{\sim}\}) \qquad (2)$$

Uncertainty values close to 1.0 indicate that CROWD has shown itself to be unknowledgeable
about the fact to be vetted. To illustrate this, suppose that when CROWD is asked
to provide the producers for the movie db:Non-Stop_(film), the obtained answer is
$t \in KB^\sim$ with $t$ = (db:Non-Stop_(film), db-prop:producer, _:o1, 0.97). High uncertainty
values indicate that CROWD does not have the knowledge to answer this question, and hence
it is not useful to insist further on an assessment of this fact by the crowd.
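
Equations 1 and 2 can be sketched together as follows. The sketch assumes that each knowledge base is a plain list of quads and that an empty set of quads averages to 0.0, which is the convention the worked iterations in Section 4.4 appear to follow; the helper names are illustrative only.

```python
from typing import List, Tuple

Quad = Tuple[str, str, str, float]  # (s, p, o, m)


def mean_membership(kb: List[Quad], s: str, p: str) -> float:
    """Average membership degree of the quads about (s, p); 0.0 if there are none."""
    degrees = [m for (subj, pred, _obj, m) in kb if subj == s and pred == p]
    return sum(degrees) / len(degrees) if degrees else 0.0


def disagreement(kb_plus: List[Quad], kb_minus: List[Quad], s: str, p: str) -> float:
    """Equation 1: D(s|p) = 1 - |m+ - m-|."""
    return 1.0 - abs(mean_membership(kb_plus, s, p) - mean_membership(kb_minus, s, p))


def uncertainty(kb_unknown: List[Quad], s: str, p: str) -> float:
    """Equation 2: U(s|p) is the average membership degree in the 'unknown' set."""
    return mean_membership(kb_unknown, s, p)


# Knowledge bases from the two examples in the text.
KB_PLUS = [("db:Tower_Heist", "db-prop:producer", "db:Brian_Grazer", 0.90)]
KB_MINUS = [("db:Tower_Heist", "db-prop:producer", "_:o", 0.05)]
KB_UNKNOWN = [("db:Non-Stop_(film)", "db-prop:producer", "_:o1", 0.97)]

print(round(disagreement(KB_PLUS, KB_MINUS, "db:Tower_Heist", "db-prop:producer"), 2))
print(uncertainty(KB_UNKNOWN, "db:Non-Stop_(film)", "db-prop:producer"))
```

Note that with the zero-default convention a resource the crowd has never been asked about gets a disagreement of 1.0 and an uncertainty of 0.0.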

4.3 Microtask Manager 

The microtask manager creates the human tasks and submits them to the
crowdsourcing platform. This component receives the triple patterns to be crowdsourced, with
bindings produced during query execution. Consider our running example from
Listing 1.1: the bindings for ?city obtained when evaluating
(?city, a, dbpedia-yago:CapitalsInEurope) are then used to create the microtasks. For each
instance, the microtask manager exploits the semantics of the RDF resources to build rich
user interfaces that facilitate the worker's task. For example, an RDF-Hunter human task
displays "Madrid" instead of http://dbpedia.org/resource/Madrid. Providing details like
these in microtasks has proven to assist the crowd in effectively providing the right answer.

4 In RDF, existential variables are represented as blank nodes, denoted in this example as _:o.

In addition, the RDF-Hunter human tasks contain two types of questions. The first
one is related to the existence of a value for the crowdsourced triple pattern. For
example, for the triple pattern $t$ = (db:Madrid, dbpedia-owl:country, ?country) the task
displays: "Does Madrid have a country?". We devise three possible answers here, such that
$t$ can be directly mapped into the crowd knowledge bases: "Yes" → $t \in KB^+$,
"No" → $t \in KB^-$, and "Not sure" → $t \in KB^\sim$. When the answer is "Yes", a second
question requires the crowd to provide a specific value, for example, "What is the country
of Madrid?". The microtask manager aggregates the outcome of the tasks and stores them
as RDF triples annotated with the corresponding membership degree ($m$).
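
The mapping from answers to knowledge bases can be sketched as below. This is a simplified illustration that handles one already-aggregated answer at a time; the dictionary keys, the function names, and the way the membership degree is passed in are assumptions of this sketch, with $m$ computed as the average of confidence and normalized familiarity as described in Section 4.2.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, float]

# Hypothetical keys for KB+, KB- and KB~ in this sketch.
ANSWER_TO_KB = {"Yes": "plus", "No": "minus", "Not sure": "unknown"}


def membership_degree(confidence: float, familiarity: float) -> float:
    """m as the average of platform confidence and normalized worker familiarity."""
    return (confidence + familiarity) / 2.0


def store_answer(
    kb: Dict[str, List[Quad]],
    triple: Tuple[str, str, str],
    answer: str,
    confidence: float,
    familiarity: float,
) -> None:
    """Append the answered triple, annotated with m, to the matching knowledge base."""
    s, p, o = triple
    kb[ANSWER_TO_KB[answer]].append((s, p, o, membership_degree(confidence, familiarity)))


kb = {"plus": [], "minus": [], "unknown": []}
# Values for db:Madrid taken from Table 1 (confidence 1.0, familiarity 0.976).
store_answer(kb, ("db:Madrid", "dbpedia-owl:country", "db:Spain"), "Yes", 1.0, 0.976)
# Values for db:Monaco from Table 1; reading "(No value)" as a "No" answer is an assumption.
store_answer(kb, ("db:Monaco", "dbpedia-owl:country", "_:o"), "No", 0.80, 0.743)
print(kb["plus"], kb["minus"])
```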

4.4 Query Decomposer & SPARQL Query Engine 

The RDF-Hunter query decomposer automatically identifies those parts of a query $Q$
that will be handled through human computation ($S_{CROWD}$) and those executed against
the RDF data set ($S_D$). The decomposer performs three main steps: 1) partitioning of
$Q$ into $S_D$ and $S_{CROWD}$; 2) generation of the set $SQ_D$ of sub-queries to be executed
against the RDF data set; and 3) generation of the set $SQ_{CROWD}$ of sub-queries to be
posed against the crowd. The decomposer proceeds as follows: triple patterns with
constants in the predicate and object positions are added to $S_D$, while those with variables
in the subject and object positions are added to $S_{CROWD}$. For example, in the SPARQL
query from Figure 3, triple patterns $t_1$, $t_3$, and $t_4$ are inserted in $S_D$, and $t_2$ is
added to $S_{CROWD}$. Next, the triple patterns in $S_D$ are partitioned into sub-queries in
such a way that all the triple patterns that share the same subject variable are assigned to
the same sub-query. Similarly, triple patterns in $S_{CROWD}$ are grouped into sub-queries.
In our running example, triple patterns $t_1$, $t_3$, and $t_4$ share the variable ?movie, hence
they compose one sub-query in $SQ_D$; further, $t_2$ belongs to the only sub-query in
$SQ_{CROWD}$.
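
The partitioning and grouping steps can be sketched as follows. Triple patterns are modeled as 3-tuples of strings and variables are recognized by a leading "?". Patterns not covered by the two stated rules are sent to the data set here, which is an assumption of this sketch rather than something the text specifies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def group_by_subject(patterns: List[Pattern]) -> List[List[Pattern]]:
    """Group triple patterns that share the same subject into one sub-query."""
    groups: Dict[str, List[Pattern]] = defaultdict(list)
    for pattern in patterns:
        groups[pattern[0]].append(pattern)
    return list(groups.values())


def decompose(query: List[Pattern]) -> Tuple[List[List[Pattern]], List[List[Pattern]]]:
    """Return (SQ_D, SQ_CROWD) for a basic graph pattern."""
    s_d, s_crowd = [], []
    for pattern in query:
        s, _p, o = pattern
        if is_var(s) and is_var(o):
            s_crowd.append(pattern)  # variables in subject and object positions
        else:
            s_d.append(pattern)      # includes the constant predicate/object case
    return group_by_subject(s_d), group_by_subject(s_crowd)


# The query from Figure 3.
T1 = ("?movie", "rdf:type", ":Movie")
T2 = ("?movie", "db-prop:producer", "?producer")
T3 = ("?movie", "dc:subject", "dbpedia:Category:Universal_Pictures_films")
T4 = ("?movie", "dc:subject", "dbpedia:Category:Films_shot_in_New_York_City")

sq_d, sq_crowd = decompose([T1, T2, T3, T4])
print(sq_d)
print(sq_crowd)
```

For the running example this yields one sub-query with $t_1$, $t_3$, and $t_4$ for the data set and one sub-query with $t_2$ for the crowd.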

The RDF-Hunter query engine executes SPARQL queries against both RDF data
sets and the crowd to augment answer completeness. We propose an efficient algorithm
(Algorithm 1) that receives the decomposition of the query triple patterns into the sets
$SQ_D$ and $SQ_{CROWD}$, and a threshold $\tau$, and outputs the answer of the query. The
algorithm generates left-linear plans where the most selective sub-queries are executed
first, the number of joins is maximized (lines 1, 5, and 8), and Cartesian products are
executed at the end (line 11). Thus, the intermediate results $\Omega$ and the number of human
tasks are minimized. Figure 3 illustrates a query plan where the sub-query ($t_1$, $t_3$, and
$t_4$) against the RDF data set $D$ is executed first; the intermediate results $\Omega$ of this
sub-query are used to instantiate the triple pattern $t_2$ that will be crowdsourced.

Given a triple pattern $t$ in a sub-query $p_i$ assigned to the crowd, the algorithm checks
whether the instantiation of $t$ with the mappings from $\Omega$ can be evaluated against $D$
and $KB^+$ and produce complete results (line 22). If $D$ and $KB^+$ are not complete
(lines 23-24), the algorithm verifies whether the crowd could potentially collect a complete
answer, i.e., it checks whether $P(\mu(s), p) > \tau$ holds. The threshold $\tau$ is provided
by the user, and $P(\mu(s), p)$


Input: $\tau = 0.60$ and query Q

SELECT ?movie ?producer WHERE {
(t1) ?movie rdf:type :Movie .
(t2) ?movie db-prop:producer ?producer .
(t3) ?movie dc:subject dbpedia:Category:Universal_Pictures_films .
(t4) ?movie dc:subject dbpedia:Category:Films_shot_in_New_York_City . }

Query Plan: the sub-query (t1, t3, t4) is evaluated against D; its results instantiate t2

KBs

$KB^+$ = {(db:Tower_Heist, db-prop:producer, dbpedia:Brian_Grazer, 0.90)}
$KB^-$ = {(db:Tower_Heist, db-prop:producer, _:o1, 0.05)}
$KB^\sim$ = {(db:Non-Stop_(film), db-prop:producer, _:o2, 0.97)}

Intermediate Results

$\Omega$ = {{movie: dbpedia:Tower_Heist},
{movie: dbpedia:The_Interpreter},
{movie: dbpedia:Legal_Eagles},
{movie: dbpedia:Non-Stop_(film)},
...}


Fig. 3: Example of the execution of the RDF-Hunter decomposer and query engine


corresponds to the probability of crowdsourcing the evaluation of $t$, where $\mu(s)$ and $p$
are in the subject and predicate positions of $t$, respectively:

$$P(\mu(s), p) = \alpha \cdot (1 - Comp(s|p)) + (1 - \alpha) \cdot T(D(s|p), U(s|p)) \qquad (3)$$

- $\alpha \in [0.0, 1.0]$ is a score to weight the importance of the data set completeness and
the crowd knowledge,

- $Comp(s|p)$ is defined as $Comp_D(s|p) + Comp_{KB^+}(s|p)$,

- $D(s|p)$ and $U(s|p)$ correspond to the disagreement and uncertainty levels,

- $T$ is a T-norm function to combine the values of disagreement and uncertainty. We
compute $T$ as the Gödel T-norm, also called Minimum T-norm, which represents
a weak conjunction of fuzzy sets. Since the system aims at crowdsourcing triples
with high levels of disagreement but low uncertainty, we have applied the Gödel
T-norm as follows: $T(D(s|p), U(s|p)) = \min\{D(s|p), 1 - U(s|p)\}$

If $P(\mu(s), p) > \tau$ holds, the algorithm invokes the microtask manager, which creates
the corresponding microtasks and submits them to the crowdsourcing platform. The
algorithm terminates when all the sub-queries are evaluated and produces $\Omega$.

We illustrate the behavior of Algorithm 1 (lines 21-25) when
$AM_D$(schema.org:Movie|db-prop:producer) = 3 and the plan and intermediate results $\Omega$
from Figure 3 are considered.

Iteration 1: The algorithm selects the first element of the intermediate results $\Omega$,
$\mu$(movie) = db:Tower_Heist. Given that $M_D$(db:Tower_Heist|db-prop:producer) = 0 (see
Figure 2) and $M_{KB^+}$(db:Tower_Heist|db-prop:producer) = 1, the completeness of $\mu(s)$
w.r.t. $p$ is 1/3 = 0.33. The algorithm then computes the probability of evaluating the triple
pattern (db:Tower_Heist, db-prop:producer, ?producer) against the crowd (line 23). The crowd
knowledge bases $KB^+$ and $KB^-$ have information about this triple pattern, and applying
Equation 1 the algorithm obtains that $D(\mu(s)|p) = 0.15$. Notice that this triple pattern
is not in $KB^\sim$, hence $U(\mu(s)|p) = 0$. The result of applying Equation 3 is
$P(\mu(s), p) = 0.5 \cdot (1 - (0 + 0.33)) + 0.5 \cdot \min(\{0.15, 1 - 0\}) = 0.41$. This value
is lower than $\tau = 0.60$, so this pattern is not submitted to the crowd.

Iteration 2: The next instance is $\mu$(movie) = db:The_Interpreter. According to Figure 2,
$M_D$(db:The_Interpreter|db-prop:producer) = 3, so the completeness of $\mu(s)$ w.r.t. $p$ in
the data set is 1.0. Since the values of $p$ for $\mu(s)$ are complete, according to the
algorithm on line 22, this triple pattern is not crowdsourced.



Algorithm 1: RDF-Hunter Query Execution Algorithm


Input: RDF data set $D$, a decomposition ($SQ_D$, $SQ_{CROWD}$), $KB = (KB^+, KB^-, KB^\sim)$, and $\tau$.
Output: The query answer $\Omega$.

1  plan ← sq   // where sq is a sub-query from $SQ_D$ with the highest selectivity
2  $SQ'_D$ ← $SQ_D$
3  $SQ_D$ ← $SQ_D$ - {sq}
4  // (1) Plan generation
5  while $SQ_D \cup SQ_{CROWD} \neq \emptyset$ do
6      Select sq' from $SQ_{CROWD}$ such that sq' shares a variable with plan
7      $SQ_{CROWD}$ ← $SQ_{CROWD}$ - {sq'}
8      plan ← plan ∪ {sq'}
9      Select from $SQ_D$ the sub-query sq with the highest selectivity and that shares a variable with plan
10     $SQ_D$ ← $SQ_D$ - {sq}
11     plan ← plan ∪ {sq}
12 plan ← plan ∪ $SQ_{CROWD}$ ∪ $SQ_D$
13 // (2) Execution of the plan
14 $\Omega$ ← ∅   // Intermediate results
15 for $p_i$ ∈ plan do
16     if $p_i$ ∈ $SQ_D$ then
17         $p_i'$ ← instantiate($p_i$, $\Omega$)
18         $\Omega$ ← $\Omega \bowtie [[p_i']]_D$
19     else
20         $\Omega'$ ← $\pi_{var(p_i) \cap var(\Omega)}(\Omega)$
21         for $t = (s, p, o)$ ∈ $p_i$ do
22             for $\mu$ ∈ $\Omega'$ do
23                 if $Comp_D(\mu(s)|p) + Comp_{KB^+}(\mu(s)|p) < 1.0$ then
24                     if $P(\mu(s), p) > \tau$ then
25                         Invoke the Microtask Manager with $(\mu(s), p, o)$
26             $\Omega$ ← $\Omega \bowtie ([[(\mu(s), p, o)]]_D \cup [[(\mu(s), p, o)]]_{KB^+})$
27 return $\Omega$


Iteration 3: The algorithm processes $\mu$(movie) = db:Legal_Eagles, with
$M_D$(db:Legal_Eagles|db-prop:producer) = 2 (see Figure 2). The completeness of $\mu(s)$
w.r.t. $p$ is 2/3 = 0.667, which suggests that this triple pattern might be subject to
crowdsourcing (line 22). The crowd knowledge bases do not have information about this
triple pattern, therefore $m^+ = 0$, $m^- = 0$, $m^\sim = 0$; the disagreement and uncertainty
are $D(\mu(s)|p) = 1.0$ and $U(\mu(s)|p) = 0.0$, respectively. Applying Equation 3, the
algorithm obtains that $P(\mu(s), p) = 0.5 \cdot (1 - (0.667 + 0)) + 0.5 \cdot \min(\{1.0, 1 - 0.0\}) = 0.667$
(line 23). This value is higher than $\tau = 0.60$; in consequence, this pattern is submitted
to the crowd.

Iteration 4: The next instance is $\mu$(movie) = db:Non-Stop_(film), with
$M_D$(db:Non-Stop_(film)|db-prop:producer) = 0 (see Figure 2), hence the algorithm checks
whether this triple pattern is crowdsourced (line 22). $KB^+$ and $KB^-$ do not have
information about this triple pattern, therefore $D(\mu(s)|p) = 1.0$. However, $KB^\sim$ states
that this triple pattern has uncertainty $U(\mu(s)|p) = 0.97$. The algorithm computes
$P(\mu(s), p) = 0.5 \cdot (1 - (0 + 0)) + 0.5 \cdot \min(\{1, 1 - 0.97\}) = 0.515$, which is lower
than $\tau = 0.60$; therefore, this pattern is not crowdsourced.
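
The decisions taken in Iterations 1, 3, and 4 can be re-computed with a small sketch of Equation 3. The function name `crowd_probability` and the layout of the `CASES` dictionary are illustrative; the input values are the ones stated in the iterations, with $\alpha = 0.5$ and $\tau = 0.60$.

```python
def crowd_probability(
    comp_d: float,
    comp_kb_plus: float,
    disagreement: float,
    uncertainty: float,
    alpha: float = 0.5,
) -> float:
    """Equation 3 with the Goedel (minimum) T-norm over D and 1 - U."""
    comp = comp_d + comp_kb_plus
    t_norm = min(disagreement, 1.0 - uncertainty)
    return alpha * (1.0 - comp) + (1.0 - alpha) * t_norm


TAU = 0.60

# (Comp_D, Comp_KB+, D, U) as given in Iterations 1, 3 and 4.
CASES = {
    "db:Tower_Heist": (0.0, 0.33, 0.15, 0.0),
    "db:Legal_Eagles": (0.667, 0.0, 1.0, 0.0),
    "db:Non-Stop_(film)": (0.0, 0.0, 1.0, 0.97),
}

for movie, values in CASES.items():
    probability = crowd_probability(*values)
    decision = "crowdsource" if probability > TAU else "skip"
    print(movie, round(probability, 3), decision)
```

Iteration 2 is not listed because the completeness check already rules out crowdsourcing before Equation 3 is evaluated.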

The algorithm then joins the intermediate results $\Omega$ with the corresponding
instances from the data set $D$ and the crowd knowledge $KB^+$ (line 25). Considering
the RDF data set from Figure 2, the evaluation of the running query with traditional
SPARQL engines yields only 5 results: 3 producers for db:The_Interpreter and 2 producers
for db:Legal_Eagles. With RDF-Hunter, the query answer additionally contains the producer
of the movie db:Tower_Heist, and the potential answers obtained by evaluating the triple
pattern (db:Legal_Eagles, db-prop:producer, ?producer) against the crowd, as identified in
Iteration 3.


5 Experimental Study 

We conducted an empirical evaluation to assess the effectiveness of RDF-Hunter in
augmenting the answer completeness of SPARQL queries via microtasks. Below we describe
the configuration settings used in our experiments.

Query Benchmark: We created an extensive benchmark of 50 queries by analyzing
sub-queries answerable by the DBpedia SPARQL endpoint; we designed queries that
yield incomplete results, varying the result set completeness from 0.03 to
0.92 (see Table 2). To test the knowledge of the crowd, we crafted 10 queries about
different topics in each of five domains: Life Sciences, History, Music, Sports, and Movies.
Queries are composed of basic graph patterns of between three and six triple patterns.

Gold Standard: We created a gold standard of the form (triple pattern, answers). For
each triple pattern in the benchmark queries, we retrieved the answers produced by
the endpoint. When RDF-Hunter decides to submit a triple pattern t to the crowd, a
triple pattern from the gold standard with the same predicate as t is crowdsourced. The
answers from the crowd are then compared to the gold standard answers to compute accuracy.

Evaluation Metrics: i) Precision (P): Given a query q, precision measures the fraction
of the answers collected from the crowd during the hybrid evaluation of q that actually
correspond to answers of q; values of precision close to 1.0 show that the crowd outputs
correct answers for q. ii) Recall (R): Given a query q, recall measures the fraction of the
missing answers of q that are collected from the crowd during the hybrid evaluation of q;
values of recall equal to 1.0 indicate that the crowd is able to produce all of the missing
answers of q. iii) F-Measure: Combines the values of precision and recall to measure
the accuracy of the crowd output; the F-Measure is computed as $F = 2 \cdot \frac{P \cdot R}{P + R}$.
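
The three metrics can be illustrated with a short Python sketch over answer sets. This is a sketch under assumptions rather than the authors' evaluation code: the function names are hypothetical, a single `expected_answers` set stands in for both the answers of q and its missing answers, and returning 0.0 for empty inputs is a convention chosen here.

```python
def precision(crowd_answers, expected_answers):
    """Fraction of crowd answers that are expected answers."""
    crowd, expected = set(crowd_answers), set(expected_answers)
    return len(crowd & expected) / len(crowd) if crowd else 0.0


def recall(crowd_answers, expected_answers):
    """Fraction of expected (missing) answers that the crowd supplied."""
    crowd, expected = set(crowd_answers), set(expected_answers)
    return len(crowd & expected) / len(expected) if expected else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall: 2 * P * R / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Made-up answer labels, used only to exercise the functions.
crowd = {"A", "B", "C"}
expected = {"A", "B", "D", "E"}

p = precision(crowd, expected)  # 2 of 3 crowd answers are expected
r = recall(crowd, expected)     # 2 of 4 expected answers were found
f = f_measure(p, r)
```

In this toy case precision is 2/3 and recall is 1/2, so the F-Measure is 4/7.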

Implementation: RDF-Hunter is implemented in Python 2.7.6, and CrowdFlower is
used as the crowdsourcing platform. We set up RDF-Hunter with $\tau = 0.02$ and $\alpha = 0.5$.
The crowd knowledge bases $KB^{+}$, $KB^{-}$, and $KB^{\sim}$ were initially empty.

Crowdsourcing Configuration: i) Task granularity: In each task, we asked the workers
to solve a maximum of four different questions; each question corresponds to one RDF
triple. ii) Payments: The monetary reward was fixed to 0.07 US dollars per task, i.e., we
paid 0.0175 US dollars per RDF triple. iii) Judgments: We configured the CrowdFlower
platform to collect at least three answers for each question. iv) Gold Units: These correspond
to verification questions used by CrowdFlower to filter low-quality workers. In this
work, the gold units were generated from the gold standard. The gold unit distribution
was set to 10:90, i.e., for each 100 triples in the gold standard, 10 were gold units.
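
The payment and gold-unit arithmetic in this configuration can be checked with a few lines of Python. The constant and function names below are hypothetical, and `Decimal` is used only so that the currency division is exact; nothing here reflects the CrowdFlower API.

```python
from decimal import Decimal

# Values taken from the configuration described above.
REWARD_PER_TASK_USD = Decimal("0.07")
MAX_QUESTIONS_PER_TASK = 4
GOLD_UNITS_PER_100_TRIPLES = 10


def reward_per_triple(reward=REWARD_PER_TASK_USD, questions=MAX_QUESTIONS_PER_TASK):
    """Reward per RDF triple when a task holds the maximum number of questions."""
    return reward / questions


def gold_units_for(n_gold_standard_triples, per_100=GOLD_UNITS_PER_100_TRIPLES):
    """Number of gold units under the 10:90 distribution (integer division)."""
    return n_gold_standard_triples * per_100 // 100


print(reward_per_triple())  # reward per triple in US dollars
print(gold_units_for(100))  # gold units among 100 gold-standard triples
```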

We executed the benchmark queries with RDF-Hunter and crowdsourced a total of
502 RDF triples. We collected 1,619 answers from the crowd (see Table 2).

5.1 Results: Accuracy of Crowd Answers 

We report on results of precision and recall using heat maps (see Figure 4). The darkest
color represents values of precision or recall equal to 1.0. Columns correspond to the
five knowledge domains, while rows represent the benchmark queries. Altogether, the
crowd was able to respond to 21 out of 50 queries with both precision and recall equal
to 1.0, and the crowd achieved accuracy (F-Measure) values ranging from 0.84 to 0.96 (see Table 2).

5 Queries are available at http://people.aifb.kit.edu/mac/rdf-hunter

Table 2: Results when executing the benchmark with RDF-Hunter

Knowledge       Result Set Completeness   # Crowdsourced RDF Triples   # Total Crowd Responses       F-Measure
Domain          (min; max)                (w/o Gold Units)             (w/o Gold Unit Responses)
Life Sciences   (0.03; 0.92)              82                           250                           0.96
History         (0.04; 0.91)              160                          476                           0.89
Music           (0.36; 0.80)              71                           204                           0.84
Sports          (0.25; 0.86)              69                           199                           0.91
Movies          (0.46; 0.89)              120                          490                           0.95
Total           -                         502                          1,619                         -

Figure 4(a) reports the values of recall. We can observe that in 41 out of 50
queries, the crowd was able to provide all the missing values. Furthermore, for 48
queries the achieved recall is greater than 0.77. The only two cases where the crowd
achieved low recall are the queries Music Query 2 and Movies Query 4. The questions for these
queries were: “What is the name of an American blues musician?” and “What is the
gross of a movie?”, respectively. Although the crowd proved to be skilled in these two
domains in general, there are predicates within these domains for which the crowd doesn’t
exhibit high levels of expertise. This observation provides evidence of the importance
of the RDF-Hunter triple-based approach for identifying the portions of a domain
where the crowd is knowledgeable. Thus, in subsequent requests, RDF-Hunter
will exploit this knowledge to avoid crowdsourcing these two questions again.

The values of precision are reported in Figure 4(b). The crowd was able to provide
correct answers for 22 out of 50 queries. Furthermore, the crowd achieved a precision
greater than 0.75 in 35 queries. The heat map clearly shows the heterogeneity of the
level of crowd expertise and, as with recall, the precision values support the importance
of the RDF-Hunter triple-based approach.

5.2 Results: Crowd’s Confidence & Familiarity 


The values of crowd’s confidence and familiarity are used by RDF-Hunter to annotate 
the triples retrieved from the crowd. These annotations represent the membership degree 
(m) of the triples to each of the crowd knowledge bases, and guide the RDF-Hunter 
execution algorithm to devise an effective query evaluation. RDF-Hunter captures the 
crowd’s confidence provided by CrowdFlower as worker trust, which suggests how
reliable the answer provided by a worker is. In addition, the crowd’s familiarity with 
the topic is collected via required questions embedded in each microtask. 

The average and standard deviation values of the crowd’s confidence obtained from
CrowdFlower are: (Life Sciences, 0.92 ± 0.07), (History, 0.93 ± 0.07), (Music, 0.95 ±
0.07), (Sports, 0.94 ± 0.07), (Movies, 0.94 ± 0.07). The crowd’s confidence is on average
very high, indicating that the majority of the crowd answers are reliable. However, the
homogeneity of these results suggests that considering only the crowd’s confidence is
not enough to model the membership degree m of the crowd answers.

Fig. 4: Heat maps of recall and precision achieved with RDF-Hunter


Figure 5 reports the histograms for familiarity scores on a scale from 1 to 7. The
familiarity values provide insights into how difficult the workers perceive a given task to be.
In our experiments, according to the reported familiarity, the topics with the highest
familiarity were Sports and Movies. The triples retrieved from the crowd in these topics are
annotated with high values of m. On the other hand, the most challenging domain was
Life Sciences, indicating that some crowdsourced answers in this domain are annotated
with lower values of m in comparison to Sports or Movies. Thus, values of familiarity
combined with the crowd’s confidence allow for more faithful values of membership m.

6 Conclusions and Outlook 

We defined RDF-Hunter, the first hybrid engine for SPARQL query answering with human
computation. RDF-Hunter supports crowdsourcing to enhance the completeness of
Linked Data sets at query processing time. Both the quality model and the novel query
engine enable RDF-Hunter to automatically decide which parts of a SPARQL query
should resort to human computation, according to the data set quality. We designed an
extensive query benchmark of 50 queries over the DBpedia data set. Empirical results
confirm that hybrid human/computer systems can effectively increase the completeness
of SPARQL query answers. RDF-Hunter achieved F-measure values ranging from 0.84
to 0.96. In the future, we will focus on optimizations such as batching [5] and prioritization
of human computation tasks to provide a more efficient, adaptive, and scalable
RDF-Hunter query engine.

Fig. 5: Histogram and trend line of the crowd’s familiarity with tasks from five different
knowledge domains: Life Sciences, History, Sports, Movies, and Music. The x axis
represents the familiarity score from 1 to 7; the y axis represents the proportion of workers.






